
    Multi-Tenant Virtual GPUs for Optimising Performance of a Financial Risk Application

    Graphics Processing Units (GPUs) are becoming popular accelerators in modern High-Performance Computing (HPC) clusters. Installing GPUs on every node of a cluster is inefficient, resulting in high costs and power consumption as well as underutilisation of the accelerators. The research reported in this paper is motivated by the use of a few physical GPUs, giving cluster nodes on-demand access to remote GPUs for a financial risk application. We hypothesise that sharing GPUs between several nodes, referred to as multi-tenancy, reduces the execution time and energy consumed by an application. Two modes of data transfer between the CPU and the GPUs, namely concurrent and sequential, are explored. The key result from the experiments is that multi-tenancy with a few physical GPUs using sequential data transfers lowers the execution time and the energy consumed, thereby improving the overall performance of the application.
    Comment: Accepted to the Journal of Parallel and Distributed Computing (JPDC), 10 June 201

    Acceleration-as-a-Service: Exploiting Virtualised GPUs for a Financial Application

    'How can GPU acceleration be obtained as a service in a cluster?' This question has become increasingly significant because installing GPUs on all nodes of a cluster is inefficient. This paper addresses the question by employing rCUDA (remote CUDA), a framework that facilitates Acceleration-as-a-Service (AaaS) so that the nodes of a cluster can request acceleration from a set of remote GPUs on demand. The rCUDA framework exploits virtualisation and ensures that multiple nodes can share the same GPU. In this paper we test the feasibility of the rCUDA framework on a real-world application used in the financial risk industry that can benefit from AaaS in a production setting. The results confirm the feasibility of rCUDA and highlight that rCUDA achieves performance similar to CUDA, provides consistent results and, more importantly, allows a single application to benefit from all the GPUs available in the cluster without losing efficiency.
    Comment: 11th IEEE International Conference on eScience (IEEE eScience) - Munich, Germany, 201

    GPU-Job Migration: The rCUDA Case

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    Virtualization techniques have been shown to benefit data centers and other computing facilities. In this regard, not only do virtual machines make it possible to reduce the size of the computing infrastructure while increasing overall resource utilization, but virtualizing individual computer components may also provide significant benefits. This is the case, for instance, for the remote GPU virtualization technique, which has been implemented in several frameworks in recent years. The large degree of flexibility provided by remote GPU virtualization can be increased further by applying the migration mechanism to it, so that the GPU part of an application can be transparently live-migrated to another GPU elsewhere in the cluster during execution. In this paper we present the implementation of the migration mechanism within the rCUDA remote GPU virtualization middleware, together with a thorough performance analysis of that implementation. To that end, we use both synthetic and real production applications as well as three different generations of NVIDIA GPUs. Additionally, two different versions of the InfiniBand interconnect are used in this study. Several use cases are provided to show the substantial benefits that the GPU-job migration mechanism can bring to data centers.
    This work was funded by the Generalitat Valenciana under Grant PROMETEO/2017/77. The authors are grateful for the generous support provided by Mellanox Technologies Inc.
    Prades, J.; Silla Jiménez, F. (2019). GPU-Job Migration: The rCUDA Case. IEEE Transactions on Parallel and Distributed Systems. 30(12):2718-2729. https://doi.org/10.1109/TPDS.2019.2924433

    A performance comparison of CUDA remote GPU virtualization frameworks

    Using GPUs reduces the execution time of many applications but increases acquisition costs and power consumption. Furthermore, GPUs usually attain relatively low utilization. In this context, remote GPU virtualization solutions were recently created to overcome these drawbacks. Many different remote GPU virtualization frameworks currently exist, with very different characteristics, and these differences may lead to differences in performance. In this work we present a performance comparison of the only three CUDA remote GPU virtualization frameworks publicly available at no cost. Results show that performance greatly depends on the exact framework used, with the rCUDA virtualization solution standing out among them. Furthermore, rCUDA doubles performance over CUDA for pageable memory copies.
    This work was funded by the Generalitat Valenciana under Grant PROMETEOII/2013/009 of the PROMETEO program phase II. The authors are also grateful for the generous support provided by Mellanox Technologies.
    Reaño González, C.; Silla Jiménez, F. (2015). A performance comparison of CUDA remote GPU virtualization frameworks. IEEE. https://doi.org/10.1109/CLUSTER.2015.76

    Tuning remote GPU virtualization for InfiniBand networks

    The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-016-1754-3
    In the past few years, InfiniBand networks have increasingly been used to interconnect high-performance computing clusters; most of the supercomputers in the TOP500 list use either Ethernet or InfiniBand interconnects. Regarding the latter, the complexity of the InfiniBand programming API (i.e., InfiniBand Verbs) makes it difficult for applications to get the maximum performance out of these networks. In this paper we describe how we tuned a remote GPU virtualization framework whose communications module is implemented using InfiniBand Verbs. The net result is a noticeable increase in the performance of this framework, significantly reducing the gap between remote and local GPUs.
    This work was funded by the Spanish MINECO and FEDER funds under Grant TIN2012-38341-C04-01. The authors are also grateful for the generous support provided by Mellanox Technologies.
    Reaño González, C.; Silla Jiménez, F. (2016). Tuning remote GPU virtualization for InfiniBand networks. Journal of Supercomputing. 72(12):4520-4545. https://doi.org/10.1007/s11227-016-1754-3

    Reducing the Costs of Teaching CUDA in Laboratories while Maintaining the Learning Experience Quality

    Graphics Processing Units (GPUs) have become widely used to accelerate scientific applications; therefore, it is important that Computer Science and Computer Engineering curricula include the fundamentals of parallel computing with GPUs. Regarding the practical part of the training, one important concern is how to introduce GPUs into a laboratory: installing GPUs in all the computers of the lab may not be affordable, while sharing a remote GPU server among several students may result in a poor learning experience because of its associated overhead. In this paper we propose a solution to this problem: the rCUDA (remote CUDA) middleware, which enables programs running on one computer to concurrently use GPUs located in remote servers. Hence, students can concurrently and transparently share a single remote GPU from their local machines in the laboratory without having to log into the remote server. To demonstrate that our proposal is feasible, we present results from a real scenario. The results show that the cost of the laboratory is noticeably reduced while the quality of the learning experience is maintained.
    Reaño González, C.; Silla Jiménez, F. (2015). Reducing the Costs of Teaching CUDA in Laboratories while Maintaining the Learning Experience Quality. In INTED2015 Proceedings. IATED. 3651-3660. http://hdl.handle.net/10251/70229

    InfiniBand verbs optimizations for remote GPU virtualization

    The use of InfiniBand networks to interconnect high-performance computing clusters has increased considerably in recent years, to the point that most of the supercomputers in the TOP500 list use either Ethernet or InfiniBand interconnects. Regarding the latter, because of the complexity of the InfiniBand programming API (i.e., InfiniBand Verbs) and the lack of documentation, there are few recent studies explaining how to optimize applications to get maximum performance from this fabric. In this paper we present two optimizations for developing applications with InfiniBand Verbs, which provide average bandwidth improvements of 3.68% and 217.14%, respectively. In addition, we show that combining both optimizations yields an average bandwidth gain of 43.29%. This bandwidth increase is key for remote GPU virtualization frameworks: it translates into a reduction of up to 35% in the execution time of applications that use them.
    This work was funded by the Generalitat Valenciana under Grant PROMETEOII/2013/009 of the PROMETEO program phase II. The authors are also grateful for the generous support provided by Mellanox Technologies.
    Reaño González, C.; Silla Jiménez, F. (2015). InfiniBand verbs optimizations for remote GPU virtualization. IEEE. https://doi.org/10.1109/CLUSTER.2015.139

    Exploring the use of data compression for accelerating machine learning in the edge with remote virtual graphics processing units

    Internet of Things (IoT) devices are usually low-performance nodes connected by low-bandwidth networks. To improve performance in such scenarios, some computations could be done at the edge of the network. However, edge devices may not have enough computing power to accelerate applications such as popular machine learning workloads. Using remote virtual graphics processing units (GPUs) can address this concern by accelerating applications with a GPU installed in a remote device. However, this requires exchanging data with the remote GPU across the slow network. To mitigate this, the data exchanged with the remote GPU could be compressed. In this article, we explore the suitability of data compression for remote GPU virtualization frameworks in edge scenarios executing machine learning applications, carrying out this exploration with popular machine learning applications. After characterizing the GPU data transfers of these applications, we analyze the use of existing compression libraries for compressing data transfers to and from the remote GPU. Our exploration shows that transferring compressed data becomes more beneficial as networks get slower, reducing transfer time by up to 10 times. Our analysis also reveals that compression must be integrated efficiently into remote GPU virtualization frameworks.
    Funding: European Union's Horizon 2020 Research and Innovation Programme, Grant/Award Numbers: 101016577, 101017861.
    Peñaranda-Cebrián, C.; Reaño, C.; Silla, F. (2022). Exploring the use of data compression for accelerating machine learning in the edge with remote virtual graphics processing units. Concurrency and Computation: Practice and Experience. 35(20):1-19. https://doi.org/10.1002/cpe.7328
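    The claim that compression pays off more as networks get slower can be illustrated with a simple cost model. The Python sketch below is not the method used in the article; it assumes that compression, transfer, and decompression run one after another without overlap, and every figure in it (payload size, compression ratio, codec throughputs, link speeds) is a hypothetical value chosen for illustration, not a measurement from the paper.

```python
# Hypothetical cost model: is it worth compressing a GPU data transfer
# before sending it to a remote GPU over the network? All figures are
# illustrative assumptions, not measurements from the article.


def plain_transfer_time(size_bytes: float, net_bw: float) -> float:
    """Seconds to send uncompressed data over a link of net_bw bytes/s."""
    return size_bytes / net_bw


def compressed_transfer_time(size_bytes: float, net_bw: float,
                             ratio: float, comp_bw: float,
                             decomp_bw: float) -> float:
    """Seconds to compress, send, then decompress, one step after another.

    ratio     -- original size / compressed size (e.g. 4.0)
    comp_bw   -- compressor throughput, bytes of input per second
    decomp_bw -- decompressor throughput, bytes of output per second
    """
    compress = size_bytes / comp_bw            # cost independent of the link
    send = (size_bytes / ratio) / net_bw       # only the smaller payload travels
    decompress = size_bytes / decomp_bw        # cost independent of the link
    return compress + send + decompress


if __name__ == "__main__":
    size = 256e6                                # 256 MB payload (hypothetical)
    ratio, comp_bw, decomp_bw = 4.0, 2e9, 4e9   # assumed codec figures
    links = [("10 Gb/s", 10e9 / 8), ("1 Gb/s", 1e9 / 8),
             ("100 Mb/s", 100e6 / 8)]           # link speeds in bytes/s
    for name, bw in links:
        t_plain = plain_transfer_time(size, bw)
        t_comp = compressed_transfer_time(size, bw, ratio, comp_bw, decomp_bw)
        print(f"{name:>8}: plain {t_plain:7.3f} s, "
              f"compressed {t_comp:7.3f} s, speedup {t_plain / t_comp:4.2f}x")
```

    With these assumed figures, compression is slightly slower than a plain transfer on the 10 Gb/s link (speedup of about 0.84x), while on the 100 Mb/s link the speedup approaches the assumed 4x compression ratio (about 3.86x), because the codec cost does not depend on link speed and becomes small relative to the network time. A real integration would likely also overlap compression with the transfer, which this model ignores.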

    On the Effect of using rCUDA to Provide CUDA Acceleration to Xen Virtual Machines

    Many data centers use virtual machines (VMs) to make more efficient use of hardware resources. VMs reduce equipment and maintenance expenses as well as electricity consumption. Nevertheless, current virtualization solutions, such as Xen, do not easily provide graphics processing units (GPUs) to applications running in the virtualized domain with the flexibility usually required in data centers (i.e., managing virtual GPU instances and concurrently sharing them among several VMs). This lack of flexibility hinders the execution of GPU-accelerated applications within VMs. Remote GPU virtualization solutions may address this concern. In this paper we analyze the use of the remote GPU virtualization mechanism to accelerate scientific applications running inside Xen VMs. We conduct our study with six different applications, namely CUDA-MEME, CUDASW++, GPU-BLAST, LAMMPS, a triangle-counting application referred to as TRICO, and a synthetic benchmark used to emulate different application behaviors. Our experiments show that remote GPU virtualization is a feasible approach to sharing GPUs among several VMs, with very low overhead if an InfiniBand fabric is already present in the cluster.
    This work was funded by the Generalitat Valenciana under Grant PROMETEO/2017/077. The authors are also grateful for the generous support provided by Mellanox Technologies Inc.
    Prades, J.; Reaño González, C.; Silla Jiménez, F. (2019). On the Effect of using rCUDA to Provide CUDA Acceleration to Xen Virtual Machines. Cluster Computing. 22(1):185-204. https://doi.org/10.1007/s10586-018-2845-0

    Accelerator Virtualization in Fog Computing: Moving from the Cloud to the Edge

    Hardware accelerators are available in the cloud for enhanced analytics. Next-generation clouds aim to bring enhanced analytics using accelerators closer to user devices at the edge of the network, improving quality of service (QoS) by minimizing end-to-end latencies and response times. The collective computing model that uses resources across the cloud-edge continuum, in a multi-tier hierarchy comprising the cloud, the edge, and user devices, is referred to as fog computing. This article identifies challenges and opportunities in making accelerators accessible at the edge. A holistic view of the fog architecture is key to pursuing meaningful research in this area.
    Varghese, B.; Reaño González, C.; Silla Jiménez, F. (2018). Accelerator Virtualization in Fog Computing: Moving from the Cloud to the Edge. IEEE Cloud Computing. 5(6):28-37. https://doi.org/10.1109/MCC.2018.064181118